Abstract

The paper deals with the problem of penalized empirical risk minimization over
a convex set of linear functionals on the space of Hermitian matrices with convex
loss and nuclear norm penalty. Such penalization is often used in low rank matrix
recovery in the cases when the target function can be well approximated by a linear
functional generated by a Hermitian matrix of relatively small rank (compared with
the size of the matrix). Our goal is to prove sharp low rank oracle inequalities that
involve the excess risk (the approximation error) with constant equal to one and the
random error term with correct dependence on the rank of the oracle.



1 Main Result 



Let $(X, Y)$ be a couple, where $X$ is a random variable in the space $\mathbb{H}_m$ of $m \times m$ Hermitian
matrices and $Y$ is a random response variable with values in a Borel subset $T \subset \mathbb{R}$. Let
$P$ be the distribution of $(X, Y)$ and let $\Pi$ denote the marginal distribution of $X$. The
goal is to predict $Y$ based on an observation of $X$. More precisely, let $\ell : T \times \mathbb{R} \mapsto \mathbb{R}_+$
be a measurable loss function. We will assume in what follows that, for all $y \in T$,
$\ell(y; \cdot)$ is convex. Given a measurable function $f : \mathbb{H}_m \mapsto \mathbb{R}$ (a "prediction rule"), denote
$(\ell \bullet f)(x, y) := \ell(y; f(x))$ and define the risk of $f$ as

$$P(\ell \bullet f) = \mathbb{E}\,\ell(Y; f(X)).$$

Then, one can view the prediction problem as risk minimization: the goal is to find a
function $f_* : \mathbb{H}_m \mapsto \mathbb{R}$ that minimizes the risk $P(\ell \bullet f)$ over the class of all measurable
prediction rules $f : \mathbb{H}_m \mapsto \mathbb{R}$ (provided that such a function exists), or, more realistically,
to find a reasonably good approximation of $f_*$. To this end, one wants to find
a function $f$ for which the excess risk $\mathcal{E}(f) := P(\ell \bullet f) - \inf_{g : \mathbb{H}_m \mapsto \mathbb{R}} P(\ell \bullet g)$ is small
enough. Of course, the risk $P(\ell \bullet f)$ depends on the distribution $P$ of $(X, Y)$, which is,
most often, unknown. In such cases, the problem has to be solved based on the training
data $(X_1, Y_1), \dots, (X_n, Y_n)$ that consists of $n$ independent copies of $(X, Y)$. We will be
especially interested in the problems in which matrices are large and the optimal prediction
rule $f_*$ can be well approximated by a linear function $f_S(\cdot) := \langle S, \cdot \rangle$, where $S \in \mathbb{H}_m$
is a low rank Hermitian matrix, that is, when there exists a low rank matrix $S$ (an
oracle) such that the excess risk $\mathcal{E}(f_S)$ is small. Here and in what follows, $\langle \cdot, \cdot \rangle$ denotes
the Hilbert-Schmidt (Frobenius) inner product in $\mathbb{H}_m$. In such problems, we would like
to find an estimator $\hat S$ based on the training data $(X_1, Y_1), \dots, (X_n, Y_n)$ such that the
excess risk $\mathcal{E}(f_{\hat S})$ of the estimator can be bounded from above by the excess risk $\mathcal{E}(f_S)$ of
an arbitrary oracle $S \in \mathbb{H}_m$ plus an error term that properly depends on the rank of the
oracle. The resulting bounds on the excess risk $\mathcal{E}(f_{\hat S})$ of the estimator $\hat S$ are supposed
to hold with a guaranteed high probability and they are often called "low rank oracle
inequalities." We will consider below a rather traditional estimator $\hat S$ based on penalized
empirical risk minimization with a nuclear norm penalty:

$$\hat S := \operatorname{argmin}_{S \in \mathbb{D}} \bigl[ P_n(\ell \bullet f_S) + \varepsilon \|S\|_1 \bigr], \tag{1.1}$$

where $\mathbb{D} \subset \mathbb{H}_m$ is a closed convex set, $0 \in \mathbb{D}$, $P_n$ is the empirical distribution based on
the training data $(X_1, Y_1), \dots, (X_n, Y_n)$ and

$$P_n(\ell \bullet f_S) = n^{-1} \sum_{j=1}^{n} \ell(Y_j; f_S(X_j))$$

is the corresponding empirical risk with respect to the loss $\ell$, $\|S\|_1 := \mathrm{tr}(|S|) = \mathrm{tr}(\sqrt{S^2})$
is the nuclear norm of $S$ and $\varepsilon > 0$ is the regularization parameter. Clearly, optimization
problem (1.1) is convex. In fact, it is a standard convex relaxation of penalized
empirical risk minimization with a penalty proportional to the rank of $S$, denoted in
what follows by $\mathrm{rank}(S)$, which would not be a computationally tractable problem. Such
convex relaxations have been extensively studied in recent years (see Recht, Fazel
and Parrilo (2010), Candes and Recht (2009), Candes and Tao (2010), Candes and Plan
(2011), Gross (2011), Rohde and Tsybakov (2011), Negahban and Wainwright (2010),
Koltchinskii (2011), Koltchinskii, Lounici and Tsybakov (2011) and references therein).
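As an illustration of the convex program (1.1), the following is a minimal sketch of a proximal gradient iteration for the special case of the quadratic loss $\ell(y; u) = (y - u)^2$. It is not taken from the paper: it assumes real symmetric matrices, takes $\mathbb{D}$ to be the whole space (so the boundedness constraint on $\langle S, X \rangle$ is ignored), and the function names `nuclear_norm`, `prox_nuclear`, `objective` and `penalized_erm` are hypothetical.

```python
import numpy as np

def nuclear_norm(S):
    """||S||_1 = tr(|S|): for a Hermitian S, the sum of absolute eigenvalues."""
    return float(np.sum(np.abs(np.linalg.eigvalsh(S))))

def prox_nuclear(S, thr):
    """Proximal map of thr * ||.||_1: soft-threshold the eigenvalues of S."""
    lam, U = np.linalg.eigh(S)
    lam = np.sign(lam) * np.maximum(np.abs(lam) - thr, 0.0)
    return (U * lam) @ U.conj().T

def objective(S, Xs, ys, eps):
    """P_n(l . f_S) + eps * ||S||_1 for the quadratic loss (assumed here)."""
    preds = np.einsum("jkl,kl->j", Xs, S)  # <S, X_j> for real symmetric X_j
    return float(np.mean((ys - preds) ** 2)) + eps * nuclear_norm(S)

def penalized_erm(Xs, ys, eps, steps=300):
    """Proximal gradient sketch for (1.1); assumes D is the whole space."""
    n, m, _ = Xs.shape
    A = Xs.reshape(n, m * m)
    # Lipschitz constant of the gradient of the empirical quadratic risk.
    lip = 2.0 * np.linalg.norm(A, 2) ** 2 / n
    S = np.zeros((m, m))
    for _ in range(steps):
        resid = A @ S.ravel() - ys
        grad = (2.0 / n) * (A.T @ resid).reshape(m, m)
        S = prox_nuclear(S - grad / lip, eps / lip)
    return S
```

With step size $1/\mathrm{lip}$ the objective value does not increase from one iteration to the next, so the output is at least as good as the starting point $S = 0$; handling a general constraint set $\mathbb{D}$ or a general loss of quadratic type would need a different solver.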
To state our main result (a sharp low rank oracle inequality for the estimator $\hat S$), we
first introduce some assumptions and notations. In what follows, assume that for some
constant $a > 0$, $|\langle S, X \rangle| \le a$ a.s., $S \in \mathbb{D}$. It will be also assumed that $\ell$ is a convex loss
of quadratic type. More precisely, suppose that, for all $y \in T$, $\ell(y; \cdot)$ is a twice continuously
differentiable convex function in $[-a, a]$ with $Q := \sup_{y \in T} \ell(y; 0) < +\infty$,

$$L(a) := \sup_{y \in T} \sup_{u \in [-a, a]} \bigl[ |\ell'(y; 0)| + \ell''(y; u)\, a \bigr] < +\infty$$

and

$$\tau(a) := \inf_{y \in T} \inf_{u \in [-a, a]} \ell''(y; u) > 0.$$

Here $\ell', \ell''$ denote the first and the second derivatives of the loss $\ell(y; u)$ with respect
to $u$. Many important losses in regression and in large margin classification problems
are of quadratic type. In particular, if $\ell(y; u) = (y - u)^2$, $y, u \in [-a, a]$ (regression with
quadratic loss and with bounded response), then $L(a) = 4a$ and $\tau(a) = 2$. Exponential
loss $\ell(y; u) = e^{-yu}$, $y \in \{-1, 1\}$, $u \in [-a, a]$, often used in large margin methods for binary
classification, is also of quadratic type.
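The constants $L(a)$ and $\tau(a)$ can be checked numerically on a grid. The sketch below is only an illustration under stated assumptions: the helper `quadratic_type_constants` is hypothetical, and the supremum and infimum are approximated by a maximum and minimum over a finite grid that contains the endpoints $\pm a$. For the exponential loss a direct calculation gives $\ell'(y; 0) = -y$ and $\ell''(y; u) = e^{-yu}$ for $y \in \{-1, 1\}$, hence $L(a) = 1 + a e^{a}$ and $\tau(a) = e^{-a}$.

```python
import numpy as np

def quadratic_type_constants(d1, d2, ys, a, grid=401):
    """Grid approximation of L(a) and tau(a).

    d1(y, u) and d2(y, u) are the first and second derivatives of the loss
    in u; ys is a finite set of responses standing in for T (an assumption).
    """
    ys = np.asarray(ys, dtype=float)
    us = np.linspace(-a, a, grid)
    Y, U = np.meshgrid(ys, us, indexing="ij")
    second = d2(Y, U)                            # l''(y; u) on the grid
    first0 = np.abs(d1(ys, np.zeros_like(ys)))   # |l'(y; 0)|
    L = float(np.max(first0[:, None] + second * a))
    tau = float(np.min(second))
    return L, tau

a = 1.0
# Quadratic loss (y - u)^2 with y, u in [-a, a].
sq = quadratic_type_constants(lambda y, u: -2.0 * (y - u),
                              lambda y, u: 2.0 * np.ones_like(u),
                              np.linspace(-a, a, 401), a)
# Exponential loss exp(-y u) with y in {-1, 1}.
ex = quadratic_type_constants(lambda y, u: -y * np.exp(-y * u),
                              lambda y, u: y ** 2 * np.exp(-y * u),
                              [-1.0, 1.0], a)
```

Here `sq` approximates $(L(a), \tau(a)) = (4a, 2)$ and `ex` approximates $(1 + a e^{a}, e^{-a})$.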

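In what follows, $\|\cdot\|_2$ denotes the Hilbert-Schmidt (Frobenius) norm of Hermitian
matrices (generated by the inner product $\langle \cdot, \cdot \rangle$) and $\|\cdot\|$ denotes the operator norm.

We will use certain characteristics of matrices $S \in \mathbb{D}$ that are related to matrix versions
of the restricted isometry property (see, e.g., Koltchinskii (2011), Chapter 9 and references
therein). Let $S \in \mathbb{D}$ be a matrix with spectral representation $S = \sum_{j=1}^{r} \lambda_j (\phi_j \otimes \phi_j)$,
where $r := \mathrm{rank}(S)$, $\lambda_j$ are non-zero eigenvalues of $S$ (repeated with their multiplicities)
and $\phi_j \in \mathbb{C}^m$ are the corresponding orthonormal eigenvectors. In what follows, we denote

$$\mathrm{sign}(S) := \sum_{j=1}^{r} \mathrm{sign}(\lambda_j)(\phi_j \otimes \phi_j), \quad L := \mathrm{supp}(S) := \mathrm{l.s.}(\phi_1, \dots, \phi_r).$$

Let $\mathcal{P}_L, \mathcal{P}_L^{\perp}$ be the following orthogonal projectors in the space $(\mathbb{H}_m, \langle \cdot, \cdot \rangle)$:

$$\mathcal{P}_L(A) := A - P_{L^{\perp}} A P_{L^{\perp}}, \quad \mathcal{P}_L^{\perp}(A) := P_{L^{\perp}} A P_{L^{\perp}}, \quad A \in \mathbb{H}_m$$

(here $L^{\perp}$ is the orthogonal complement of $L$). Clearly, we have $A = \mathcal{P}_L A + \mathcal{P}_L^{\perp} A$, $A \in \mathbb{H}_m$,
providing a decomposition of a matrix $A$ into a "low rank part" $\mathcal{P}_L A$ and a "high rank
part" $\mathcal{P}_L^{\perp} A$. Given $b > 0$, define the following cone in the space $\mathbb{H}_m$

$$\mathcal{K}(\mathbb{D}; L; b) := \bigl\{ A \in \mathrm{l.s.}(\mathbb{D}) : \|\mathcal{P}_L^{\perp}(A)\|_1 \le b \|\mathcal{P}_L(A)\|_1 \bigr\}$$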
that consists of matrices $A$ with a "dominant" low rank part. Let

$$\beta^{(b)}(\mathbb{D}; L; \Pi) := \inf \bigl\{ \beta > 0 : \|\mathcal{P}_L(A)\|_2 \le \beta \|f_A\|_{L_2(\Pi)}, \ A \in \mathcal{K}(\mathbb{D}; L; b) \bigr\}.$$

This quantity is known to be bounded from above by a constant in the case when the
matrix form of "distribution dependent" restricted isometry condition holds for $r =
4\,\mathrm{rank}(S)$ (see Koltchinskii (2011), Section 9.1). In what follows, we will use the following
characteristic of oracle $S$:

$$\beta(S) := \beta^{(5)}(\mathbb{D}; L; \Pi), \quad L := \mathrm{supp}(S).$$
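The objects $\mathrm{sign}(S)$, $\mathcal{P}_L$ and $\mathcal{P}_L^{\perp}$ defined above can be sketched numerically as follows. This is an illustration only: the function names are hypothetical, and the tolerance `tol` used to decide which eigenvalues count as non-zero is an assumption that does not appear in the text.

```python
import numpy as np

def support_projector(S, tol=1e-10):
    """Orthogonal projector P_L onto L = supp(S), spanned by the
    eigenvectors of S with non-zero eigenvalues (|lambda| > tol)."""
    lam, U = np.linalg.eigh(S)
    V = U[:, np.abs(lam) > tol]
    return V @ V.conj().T

def sign_matrix(S, tol=1e-10):
    """sign(S) = sum_j sign(lambda_j) (phi_j x phi_j) over non-zero eigenvalues."""
    lam, U = np.linalg.eigh(S)
    keep = np.abs(lam) > tol
    return (U[:, keep] * np.sign(lam[keep])) @ U[:, keep].conj().T

def split_low_high(A, S):
    """Return (P_L(A), P_L^perp(A)) with L = supp(S)."""
    m = S.shape[0]
    P_perp = np.eye(m) - support_projector(S)   # projector onto L^perp
    high = P_perp @ A @ P_perp                  # "high rank part"
    low = A - high                              # "low rank part"
    return low, high
```

For any Hermitian $A$ the two parts add up to $A$ and are orthogonal in the Hilbert-Schmidt inner product, and the low rank part has rank at most $2\,\mathrm{rank}(S)$.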



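For arbitrary $t > 0$ and $S \in \mathbb{D}$, denote

$$t(S; \varepsilon) := t + 3 \log \Bigl( B \log_2 \bigl( \|S\|_1 \vee n \vee \varepsilon \vee Q \vee a^{-1} \vee (L(a))^{-1} \vee 2 \bigr) \Bigr),$$

where $B > 0$ is a constant. Let

$$\Delta := \mathbb{E} \Bigl\| \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \varepsilon_j X_j \Bigr\|,$$

where $\{\varepsilon_j\}$ are i.i.d. Rademacher random variables independent of $\{X_j\}$.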
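For a given sample of design matrices, the Rademacher part of $\Delta$ can be approximated by simulation. The sketch below is an assumption-laden illustration, not part of the paper: it averages only over the random signs $\varepsilon_j$ for a fixed sample $X_1, \dots, X_n$ (so it estimates a conditional version of $\Delta$, not the full expectation over the design), and the name `rademacher_delta` is hypothetical.

```python
import numpy as np

def rademacher_delta(Xs, n_draws=200, seed=0):
    """Monte Carlo estimate of E_eps || n^{-1/2} sum_j eps_j X_j ||
    (operator norm), conditionally on the sample Xs of shape (n, m, m)."""
    rng = np.random.default_rng(seed)
    n = Xs.shape[0]
    norms = []
    for _ in range(n_draws):
        signs = rng.choice([-1.0, 1.0], size=n)          # Rademacher signs
        M = np.tensordot(signs, Xs, axes=1) / np.sqrt(n)  # n^{-1/2} sum eps_j X_j
        norms.append(np.linalg.norm(M, 2))                # largest singular value
    return float(np.mean(norms))
```

Such an estimate could be used as a rough guide when choosing $\varepsilon$ in accordance with the threshold of Theorem 1, although the numerical constant $D$ there is not specified.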

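Theorem 1 There exist a numerical constant $B > 0$ in the definition of $t(S; \varepsilon)$ and
numerical constants $C, D > 0$ such that for all $t > 0$ and all

$$\varepsilon \ge \frac{D L(a) \Delta}{\sqrt{n}}, \tag{1.2}$$

with probability at least $1 - e^{-t}$,

$$P(\ell \bullet f_{\hat S}) \le \inf_{S \in \mathbb{D}} \Bigl[ P(\ell \bullet f_S) + \Bigl( \frac{3}{\tau(a)} \beta^2(S)\, \mathrm{rank}(S)\, \varepsilon^2 \wedge 2 \varepsilon \|S\|_1 \Bigr) + C(a) \frac{t(S; \varepsilon)}{n} \Bigr], \tag{1.3}$$

where

$$C(a) := C \Bigl( \frac{L^2(a)}{\tau(a)} \vee L(a)\, a \Bigr).$$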



To control the size of the expectation $\Delta$ involved in the threshold (1.2) on $\varepsilon$, one can use
a noncommutative version of Bernstein inequality due to Ahlswede and Winter (2002).
Namely, the following upper bound easily follows from this inequality (by integrating its
exponential tail bounds):

$$\Delta \le 4 \Bigl( \sigma_X \sqrt{\log(2m)} \vee U_X \frac{\log(2m)}{\sqrt{n}} \Bigr),$$

where $\sigma_X^2 := \|\mathbb{E} X^2\|$ and $U_X := \bigl\| \|X\| \bigr\|_{L_\infty}$. This bound can be easily applied to various
specific sampling models used in low rank matrix recovery, such as sampling from an
orthonormal basis that includes, in particular, matrix completion (see, e.g., Koltchinskii
(2011), Chapter 9), leading to more concrete results.

The main feature of oracle inequality (1.3) is that it involves the approximation
error term $\mathcal{E}(f_S)$ (the excess risk of the oracle $S$) with constant equal to 1. In this sense,
bound (1.3) is what is usually called a sharp oracle inequality. Most of the low rank oracle
inequalities for the nuclear norm penalization method proved in the recent literature are
not sharp in the sense that the oracle excess risk $\mathcal{E}(f_S)$ is involved in these bounds with
a constant strictly larger than 1. Sharp oracle inequalities are especially important in
the cases when, for all oracles $S \in \mathbb{D}$, the approximation error is not particularly small.
The first sharp oracle inequalities for the nuclear norm penalization method were proved
in Koltchinskii, Lounici and Tsybakov (2011). It was done for a "linearized version" of
the least squares method with nuclear norm penalty. Under the boundedness assumption
$|\langle S, X \rangle| \le a$ a.s., $S \in \mathbb{D}$ for some $a > 0$ (the same assumption is used in our paper),
Klopp (2012) proved error bounds (without approximation error term) for the usual
matrix LASSO (that is, the nuclear norm penalized least squares method). Earlier, Negahban
and Wainwright (2010) studied the same problem under additional assumptions on the
so called "spikiness" of the target matrices. Koltchinskii and Rangel (2012) stated a
sharp oracle inequality for the same method in the case of the noisy matrix completion
problem with uniform design (in fact, they deduced this result from more general oracle
bounds for estimators of low rank smooth kernels on graphs). In the current paper, we
establish sharp oracle inequalities for a version of the problem with more general losses
of quadratic type and for general design distributions. Note also that the main part of
the random error term of bound (1.3) (that is, the term $\frac{3}{\tau(a)} \beta^2(S)\, \mathrm{rank}(S)\, \varepsilon^2 \wedge 2 \varepsilon \|S\|_1$)
depends correctly on the rank of the oracle. This follows from the minimax lower bounds
proved in Koltchinskii, Lounici and Tsybakov (2011) (in fact, the form of the random
error term in (1.3) is the same as in that paper).

2 Proof 

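We start with the following condition that is necessary for $\hat S$ to be a solution of convex
optimization problem (1.1): for some $\hat V \in \partial \|\hat S\|_1$,

$$P_n(\ell' \bullet f_{\hat S})(f_{\hat S} - f_S) + \varepsilon \langle \hat V, \hat S - S \rangle \le 0, \quad S \in \mathbb{D}$$

(see, e.g., Aubin and Ekeland (1984), Chap. 2, Corollary 6; see also Koltchinskii (2011),
pp. 198-199). This implies that, for all $S \in \mathbb{D}$,

$$P(\ell' \bullet f_{\hat S})(f_{\hat S} - f_S) + \varepsilon \langle \hat V, \hat S - S \rangle \le (P - P_n)(\ell' \bullet f_{\hat S})(f_{\hat S} - f_S). \tag{2.1}$$

Since both $\hat S, S \in \mathbb{D}$, we have $|f_{\hat S}(X)| \le a$, $|f_S(X)| \le a$ a.s., and since $\ell$ is a loss of
quadratic type, it is easy to check that

$$P(\ell' \bullet f_{\hat S})(f_{\hat S} - f_S) \ge P(\ell \bullet f_{\hat S}) - P(\ell \bullet f_S) + \frac{1}{2} \tau(a) \|f_{\hat S} - f_S\|^2_{L_2(\Pi)}. \tag{2.2}$$

If $P(\ell \bullet f_{\hat S}) \le P(\ell \bullet f_S)$, the oracle inequality of the theorem holds trivially. So, we assume
in what follows that $P(\ell \bullet f_{\hat S}) > P(\ell \bullet f_S)$. Inequalities (2.1) and (2.2) imply that

$$P(\ell \bullet f_{\hat S}) + \frac{1}{2} \tau(a) \|f_{\hat S} - f_S\|^2_{L_2(\Pi)} + \varepsilon \langle \hat V, \hat S - S \rangle \le P(\ell \bullet f_S) + (P - P_n)(\ell' \bullet f_{\hat S})(f_{\hat S} - f_S). \tag{2.3}$$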

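The following characterization of the subdifferential of the nuclear norm is well known:

$$\partial \|S\|_1 = \bigl\{ \mathrm{sign}(S) + \mathcal{P}_L^{\perp}(M) : M \in \mathbb{H}_m, \ \|M\| \le 1 \bigr\},$$

where $L = \mathrm{supp}(S)$ (see, e.g., Koltchinskii (2011), Appendix A.4). By the duality between
the operator and nuclear norms, there exists $M \in \mathbb{H}_m$ with $\|M\| \le 1$ such that

$$\langle \mathcal{P}_L^{\perp}(M), \hat S - S \rangle = \langle M, \mathcal{P}_L^{\perp}(\hat S - S) \rangle = \|\mathcal{P}_L^{\perp}(\hat S - S)\|_1 = \|\mathcal{P}_L^{\perp} \hat S\|_1.$$

Then, by monotonicity of subdifferentials of convex functions, we have, for $V = \mathrm{sign}(S) +
\mathcal{P}_L^{\perp}(M) \in \partial \|S\|_1$, that

$$\langle \mathrm{sign}(S), \hat S - S \rangle + \|\mathcal{P}_L^{\perp} \hat S\|_1 = \langle V, \hat S - S \rangle \le \langle \hat V, \hat S - S \rangle.$$

We now substitute the last bound in (2.3) to get

$$\begin{aligned}
& P(\ell \bullet f_{\hat S}) + \frac{1}{2} \tau(a) \|f_{\hat S} - f_S\|^2_{L_2(\Pi)} + \varepsilon \|\mathcal{P}_L^{\perp} \hat S\|_1 \\
& \quad \le P(\ell \bullet f_S) + \varepsilon \langle \mathrm{sign}(S), S - \hat S \rangle + (P - P_n)(\ell' \bullet f_{\hat S})(f_{\hat S} - f_S).
\end{aligned} \tag{2.4}$$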

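The main part of the proof is a derivation of an upper bound on the empirical
process $(P - P_n)(\ell' \bullet f_{\hat S})(f_{\hat S} - f_S)$. For a given $S \in \mathbb{D}$ and for $\delta_1, \delta_2, \delta_3, \delta_4 \ge 0$, denote

$$\mathcal{A}(\delta_1, \delta_2) := \bigl\{ A \in \mathbb{D} : A - S \in \mathcal{K}(\mathbb{D}; L; 5), \ \|f_A - f_S\|_{L_2(\Pi)} \le \delta_1, \ \|\mathcal{P}_L^{\perp} A\|_1 \le \delta_2 \bigr\},$$

$$\tilde{\mathcal{A}}(\delta_1, \delta_2, \delta_3) := \bigl\{ A \in \mathbb{D} : \|f_A - f_S\|_{L_2(\Pi)} \le \delta_1, \ \|\mathcal{P}_L^{\perp} A\|_1 \le \delta_2, \ \|\mathcal{P}_L(A - S)\|_1 \le \delta_3 \bigr\},$$

$$\check{\mathcal{A}}(\delta_1, \delta_4) := \bigl\{ A \in \mathbb{D} : \|f_A - f_S\|_{L_2(\Pi)} \le \delta_1, \ \|A - S\|_1 \le \delta_4 \bigr\},$$

and

$$\alpha_n(\delta_1, \delta_2) := \sup \bigl\{ |(P_n - P)(\ell' \bullet f_A)(f_A - f_S)| : A \in \mathcal{A}(\delta_1, \delta_2) \bigr\},$$

$$\tilde\alpha_n(\delta_1, \delta_2, \delta_3) := \sup \bigl\{ |(P_n - P)(\ell' \bullet f_A)(f_A - f_S)| : A \in \tilde{\mathcal{A}}(\delta_1, \delta_2, \delta_3) \bigr\},$$

$$\check\alpha_n(\delta_1, \delta_4) := \sup \bigl\{ |(P_n - P)(\ell' \bullet f_A)(f_A - f_S)| : A \in \check{\mathcal{A}}(\delta_1, \delta_4) \bigr\}.$$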

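Lemma 1 Suppose $0 < \delta_k^- < \delta_k^+$, $k = 1, 2, 3, 4$. Let $t > 0$ and

$$\bar t := t + \sum_{k=1}^{2} \log\bigl( [\log_2(\delta_k^+ / \delta_k^-)] + 2 \bigr) + \log 3,$$

$$\tilde t := t + \sum_{k=1}^{3} \log\bigl( [\log_2(\delta_k^+ / \delta_k^-)] + 2 \bigr) + \log 3,$$

$$\check t := t + \sum_{k=1,4} \log\bigl( [\log_2(\delta_k^+ / \delta_k^-)] + 2 \bigr) + \log 3.$$

Then, with probability at least $1 - e^{-t}$, for all $\delta_k \in [\delta_k^-, \delta_k^+]$, $k = 1, 2, 3, 4$,

$$\alpha_n(\delta_1, \delta_2) \le 2 C_1 L(a)\, \mathbb{E}\|\Xi\| \bigl( \sqrt{\mathrm{rank}(S)}\, \beta(S)\, \delta_1 + \delta_2 \bigr) + 4 L(a)\, \delta_1 \sqrt{\frac{\bar t}{n}} + 4 L(a)\, a\, \frac{\bar t}{n}, \tag{2.5}$$

$$\tilde\alpha_n(\delta_1, \delta_2, \delta_3) \le 2 C_2 L(a)\, \mathbb{E}\|\Xi\| (\delta_2 + \delta_3) + 4 L(a)\, \delta_1 \sqrt{\frac{\tilde t}{n}} + 4 L(a)\, a\, \frac{\tilde t}{n}, \tag{2.6}$$

and

$$\check\alpha_n(\delta_1, \delta_4) \le 2 C_2 L(a)\, \mathbb{E}\|\Xi\|\, \delta_4 + 4 L(a)\, \delta_1 \sqrt{\frac{\check t}{n}} + 4 L(a)\, a\, \frac{\check t}{n}, \tag{2.7}$$

where $C_1, C_2 > 0$ are numerical constants and $\Xi := n^{-1} \sum_{j=1}^{n} \varepsilon_j X_j$.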

Proof. We will prove in detail only the first bound (2.5). Talagrand's concentration
inequality (in Bousquet's form, see Koltchinskii (2011), p. 25) implies that, for all $\delta_1, \delta_2 >
0$, with probability at least $1 - e^{-t}$,

$$\alpha_n(\delta_1, \delta_2) \le 2\, \mathbb{E} \alpha_n(\delta_1, \delta_2) + 2 L(a)\, \delta_1 \sqrt{\frac{t}{n}} + 4 L(a)\, a\, \frac{t}{n},$$

where we also used the bounds

$$|(\ell' \bullet f_A)(f_A - f_S)| \le 2 L(a)\, a, \quad P(\ell' \bullet f_A)^2 (f_A - f_S)^2 \le L^2(a) \|f_A - f_S\|^2_{L_2(\Pi)} \le L^2(a)\, \delta_1^2$$

that hold under the assumptions on the loss. The next step is to use standard Rademacher
symmetrization and contraction inequalities (see, e.g., Koltchinskii (2011), sections 2.1,
2.2) to get

$$\mathbb{E} \alpha_n(\delta_1, \delta_2) \le 16 L(a)\, \mathbb{E} \sup \bigl\{ |R_n(f_A - f_S)| : A \in \mathcal{A}(\delta_1, \delta_2) \bigr\}, \tag{2.8}$$

where $R_n(f) := n^{-1} \sum_{j=1}^{n} \varepsilon_j f(X_j)$, $\{\varepsilon_j\}$ being i.i.d. Rademacher random variables independent
of $\{(X_j, Y_j)\}$, and where we also used a simple fact that the Lipschitz constant of
the function $u \mapsto \ell'(f_S + u)\, u$ is upper bounded by $4 L(a)$. We will bound the expected
sup-norm of the Rademacher process in the right hand side of (2.8). Observe that

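$$R_n(f_A - f_S) = \langle \Xi, A - S \rangle, \quad \Xi := n^{-1} \sum_{j=1}^{n} \varepsilon_j X_j,$$

which implies

$$\begin{aligned}
|R_n(f_A - f_S)| & \le |\langle \mathcal{P}_L \Xi, \mathcal{P}_L(A - S) \rangle| + |\langle \Xi, \mathcal{P}_L^{\perp}(A - S) \rangle| \\
& \le \|\mathcal{P}_L \Xi\|_2 \|\mathcal{P}_L(A - S)\|_2 + \|\Xi\| \|\mathcal{P}_L^{\perp} A\|_1 \\
& \le 2 \sqrt{2\, \mathrm{rank}(S)}\, \beta(S) \|\Xi\| \|f_A - f_S\|_{L_2(\Pi)} + \|\Xi\| \|\mathcal{P}_L^{\perp} A\|_1,
\end{aligned} \tag{2.9}$$

where we used the facts that $A - S \in \mathcal{K}(\mathbb{D}; L; 5)$ and also that

$$\mathrm{rank}(\mathcal{P}_L \Xi) \le 2\, \mathrm{rank}(S), \quad \|\mathcal{P}_L \Xi\|_2 \le 2 \sqrt{\mathrm{rank}(\mathcal{P}_L \Xi)}\, \|\Xi\|.$$

Therefore,

$$\mathbb{E} \sup \bigl\{ |R_n(f_A - f_S)| : A \in \mathcal{A}(\delta_1, \delta_2) \bigr\} \le \mathbb{E}\|\Xi\| \bigl( 2 \sqrt{2\, \mathrm{rank}(S)}\, \beta(S)\, \delta_1 + \delta_2 \bigr). \tag{2.10}$$

It follows that with some numerical constant $C_1 > 0$ and with probability at least $1 - e^{-t}$,

$$\alpha_n(\delta_1, \delta_2) \le C_1 L(a)\, \mathbb{E}\|\Xi\| \bigl( \sqrt{\mathrm{rank}(S)}\, \beta(S)\, \delta_1 + \delta_2 \bigr) + 2 L(a)\, \delta_1 \sqrt{\frac{t}{n}} + 4 L(a)\, a\, \frac{t}{n}. \tag{2.11}$$

We will make this bound uniform in $\delta_k \in [\delta_k^-, \delta_k^+]$. To this end, let $\delta_k^j := \delta_k^+ 2^{-j}$, $j =
0, \dots, [\log_2(\delta_k^+ / \delta_k^-)] + 1$. By the union bound, with probability at least $1 - \frac{1}{3} e^{-t}$, for all
$j_k = 0, \dots, [\log_2(\delta_k^+ / \delta_k^-)] + 1$, $k = 1, 2$,

$$\alpha_n(\delta_1^{j_1}, \delta_2^{j_2}) \le C_1 L(a)\, \mathbb{E}\|\Xi\| \bigl( \sqrt{\mathrm{rank}(S)}\, \beta(S)\, \delta_1^{j_1} + \delta_2^{j_2} \bigr) + 2 L(a)\, \delta_1^{j_1} \sqrt{\frac{\bar t}{n}} + 4 L(a)\, a\, \frac{\bar t}{n}, \tag{2.12}$$

which implies that, for all $\delta_k \in [\delta_k^-, \delta_k^+]$, $k = 1, 2$,

$$\alpha_n(\delta_1, \delta_2) \le 2 C_1 L(a)\, \mathbb{E}\|\Xi\| \bigl( \sqrt{\mathrm{rank}(S)}\, \beta(S)\, \delta_1 + \delta_2 \bigr) + 4 L(a)\, \delta_1 \sqrt{\frac{\bar t}{n}} + 4 L(a)\, a\, \frac{\bar t}{n}. \tag{2.13}$$

The proof of the second and the third bounds is similar. For instance, in the case
of the second bound, the only difference is that instead of (2.9) we use

$$|R_n(f_A - f_S)| \le \|\Xi\| \bigl( \|\mathcal{P}_L(A - S)\|_1 + \|\mathcal{P}_L^{\perp}(A - S)\|_1 \bigr), \tag{2.14}$$

which yields (instead of (2.10))

$$\mathbb{E} \sup \bigl\{ |R_n(f_A - f_S)| : A \in \tilde{\mathcal{A}}(\delta_1, \delta_2, \delta_3) \bigr\} \le \mathbb{E}\|\Xi\| (\delta_2 + \delta_3). \tag{2.15}$$

□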

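Note that

$$(P - P_n)(\ell' \bullet f_{\hat S})(f_{\hat S} - f_S) \le \tilde\alpha_n \bigl( \|f_{\hat S} - f_S\|_{L_2(\Pi)}; \ \|\mathcal{P}_L^{\perp} \hat S\|_1; \ \|\mathcal{P}_L(\hat S - S)\|_1 \bigr), \tag{2.16}$$

$$(P - P_n)(\ell' \bullet f_{\hat S})(f_{\hat S} - f_S) \le \check\alpha_n \bigl( \|f_{\hat S} - f_S\|_{L_2(\Pi)}; \ \|\hat S - S\|_1 \bigr), \tag{2.17}$$

and also, if $\hat S - S \in \mathcal{K}(\mathbb{D}; L; 5)$, then

$$(P - P_n)(\ell' \bullet f_{\hat S})(f_{\hat S} - f_S) \le \alpha_n \bigl( \|f_{\hat S} - f_S\|_{L_2(\Pi)}; \ \|\mathcal{P}_L^{\perp} \hat S\|_1 \bigr). \tag{2.18}$$

Assume for a while that

$$\|f_{\hat S} - f_S\|_{L_2(\Pi)} \in [\delta_1^-, \delta_1^+], \quad \|\mathcal{P}_L^{\perp} \hat S\|_1 \in [\delta_2^-, \delta_2^+], \quad \|\mathcal{P}_L(\hat S - S)\|_1 \in [\delta_3^-, \delta_3^+]. \tag{2.19}$$

First, we substitute (2.17) in bound (2.3) and use the upper bound on $\check\alpha_n$ of Lemma 1.
Observe also that, since $\hat V \in \partial \|\hat S\|_1$,

$$\langle \hat V, S - \hat S \rangle \le \|S\|_1 - \|\hat S\|_1. \tag{2.20}$$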

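Therefore, we get

$$\begin{aligned}
& P(\ell \bullet f_{\hat S}) + \frac{1}{2} \tau(a) \|f_{\hat S} - f_S\|^2_{L_2(\Pi)} \\
& \quad \le P(\ell \bullet f_S) + \varepsilon (\|S\|_1 - \|\hat S\|_1) + \check\alpha_n \bigl( \|f_{\hat S} - f_S\|_{L_2(\Pi)}; \ \|\hat S - S\|_1 \bigr) \\
& \quad \le P(\ell \bullet f_S) + \varepsilon (\|S\|_1 - \|\hat S\|_1) + 2 C_2 L(a)\, \mathbb{E}\|\Xi\| \|\hat S - S\|_1 \\
& \qquad + 4 L(a) \|f_{\hat S} - f_S\|_{L_2(\Pi)} \sqrt{\frac{\check t}{n}} + 4 L(a)\, a\, \frac{\check t}{n}.
\end{aligned} \tag{2.21}$$

Assume that the constant $D$ in the condition on $\varepsilon$ satisfies $D \ge 8 C_2$. Then, we have

$$\varepsilon \ge D L(a)\, \Delta\, n^{-1/2} \ge 8 C_2 L(a)\, \mathbb{E}\|\Xi\|. \tag{2.22}$$

Using the bound

$$4 L(a) \|f_{\hat S} - f_S\|_{L_2(\Pi)} \sqrt{\frac{\check t}{n}} \le \frac{1}{2} \tau(a) \|f_{\hat S} - f_S\|^2_{L_2(\Pi)} + \frac{8 L^2(a)}{\tau(a)} \frac{\check t}{n},$$

we get from (2.21)

$$\begin{aligned}
P(\ell \bullet f_{\hat S}) & \le P(\ell \bullet f_S) + \varepsilon (\|S\|_1 - \|\hat S\|_1) + \varepsilon \|\hat S - S\|_1 + \Bigl( \frac{8 L^2(a)}{\tau(a)} + 4 L(a)\, a \Bigr) \frac{\check t}{n} \\
& \le P(\ell \bullet f_S) + 2 \varepsilon \|S\|_1 + \Bigl( \frac{8 L^2(a)}{\tau(a)} + 4 L(a)\, a \Bigr) \frac{\check t}{n}.
\end{aligned} \tag{2.23}$$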

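We will now substitute (2.16) in bound (2.4) and use the upper bound on $\tilde\alpha_n$ of
Lemma 1. We will also bound $\langle \mathrm{sign}(S), S - \hat S \rangle$ as follows:

$$|\langle \mathrm{sign}(S), S - \hat S \rangle| = |\langle \mathrm{sign}(S), \mathcal{P}_L(S - \hat S) \rangle| \le \|\mathrm{sign}(S)\| \|\mathcal{P}_L(S - \hat S)\|_1 \le \|\mathcal{P}_L(S - \hat S)\|_1. \tag{2.24}$$

We get

$$\begin{aligned}
& P(\ell \bullet f_{\hat S}) + \frac{1}{2} \tau(a) \|f_{\hat S} - f_S\|^2_{L_2(\Pi)} + \varepsilon \|\mathcal{P}_L^{\perp}(\hat S - S)\|_1 \\
& \quad \le P(\ell \bullet f_S) + \varepsilon \|\mathcal{P}_L(\hat S - S)\|_1 + \tilde\alpha_n \bigl( \|f_{\hat S} - f_S\|_{L_2(\Pi)}; \ \|\mathcal{P}_L^{\perp} \hat S\|_1; \ \|\mathcal{P}_L(\hat S - S)\|_1 \bigr) \\
& \quad \le P(\ell \bullet f_S) + \varepsilon \|\mathcal{P}_L(\hat S - S)\|_1 + 2 C_2 L(a)\, \mathbb{E}\|\Xi\| \bigl( \|\mathcal{P}_L^{\perp} \hat S\|_1 + \|\mathcal{P}_L(\hat S - S)\|_1 \bigr) \\
& \qquad + 4 L(a) \|f_{\hat S} - f_S\|_{L_2(\Pi)} \sqrt{\frac{\tilde t}{n}} + 4 L(a)\, a\, \frac{\tilde t}{n}.
\end{aligned} \tag{2.25}$$

We still assume that $D \ge 8 C_2$ and, thus, (2.22) holds. Using the bound

$$4 L(a) \|f_{\hat S} - f_S\|_{L_2(\Pi)} \sqrt{\frac{\tilde t}{n}} \le \frac{1}{2} \tau(a) \|f_{\hat S} - f_S\|^2_{L_2(\Pi)} + \frac{8 L^2(a)}{\tau(a)} \frac{\tilde t}{n},$$

we get from (2.25)

$$\begin{aligned}
& P(\ell \bullet f_{\hat S}) + \varepsilon \|\mathcal{P}_L^{\perp}(\hat S - S)\|_1 \\
& \quad \le P(\ell \bullet f_S) + \varepsilon \|\mathcal{P}_L(\hat S - S)\|_1 + \frac{\varepsilon}{4} \bigl( \|\mathcal{P}_L^{\perp} \hat S\|_1 + \|\mathcal{P}_L(\hat S - S)\|_1 \bigr) + \Bigl( \frac{8 L^2(a)}{\tau(a)} + 4 L(a)\, a \Bigr) \frac{\tilde t}{n}.
\end{aligned} \tag{2.26}$$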



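If

$$\Bigl( \frac{8 L^2(a)}{\tau(a)} + 4 L(a)\, a \Bigr) \frac{\tilde t}{n} \ge \varepsilon \|\mathcal{P}_L(\hat S - S)\|_1 + \frac{\varepsilon}{4} \bigl( \|\mathcal{P}_L^{\perp} \hat S\|_1 + \|\mathcal{P}_L(\hat S - S)\|_1 \bigr),$$

we conclude that

$$P(\ell \bullet f_{\hat S}) \le P(\ell \bullet f_S) + \Bigl( \frac{16 L^2(a)}{\tau(a)} + 8 L(a)\, a \Bigr) \frac{\tilde t}{n}, \tag{2.27}$$

which suffices to prove the bound of the theorem. Otherwise, we use the assumption that
$P(\ell \bullet f_{\hat S}) > P(\ell \bullet f_S)$ to get the following bound from (2.26):

$$\varepsilon \|\mathcal{P}_L^{\perp}(\hat S - S)\|_1 \le 2 \varepsilon \|\mathcal{P}_L(\hat S - S)\|_1 + \frac{\varepsilon}{2} \bigl( \|\mathcal{P}_L^{\perp}(\hat S - S)\|_1 + \|\mathcal{P}_L(\hat S - S)\|_1 \bigr).$$

This yields

$$\frac{\varepsilon}{2} \|\mathcal{P}_L^{\perp}(\hat S - S)\|_1 \le \frac{5 \varepsilon}{2} \|\mathcal{P}_L(\hat S - S)\|_1,$$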

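and, hence, $\hat S - S \in \mathcal{K}(\mathbb{D}; L; 5)$. This fact allows us to use the bound on $\alpha_n$ of Lemma 1.
We can modify (2.24) as follows

$$\begin{aligned}
|\langle \mathrm{sign}(S), S - \hat S \rangle| & = |\langle \mathrm{sign}(S), \mathcal{P}_L(S - \hat S) \rangle| \\
& \le \|\mathrm{sign}(S)\|_2 \|\mathcal{P}_L(S - \hat S)\|_2 \le \sqrt{\mathrm{rank}(S)}\, \beta(S) \|f_{\hat S} - f_S\|_{L_2(\Pi)},
\end{aligned} \tag{2.28}$$

and, instead of (2.25), we get

$$\begin{aligned}
& P(\ell \bullet f_{\hat S}) + \frac{1}{2} \tau(a) \|f_{\hat S} - f_S\|^2_{L_2(\Pi)} + \varepsilon \|\mathcal{P}_L^{\perp} \hat S\|_1 \\
& \quad \le P(\ell \bullet f_S) + \varepsilon \sqrt{\mathrm{rank}(S)}\, \beta(S) \|f_{\hat S} - f_S\|_{L_2(\Pi)} \\
& \qquad + 2 C_1 L(a)\, \mathbb{E}\|\Xi\| \bigl( \sqrt{\mathrm{rank}(S)}\, \beta(S) \|f_{\hat S} - f_S\|_{L_2(\Pi)} + \|\mathcal{P}_L^{\perp} \hat S\|_1 \bigr) \\
& \qquad + 4 L(a) \|f_{\hat S} - f_S\|_{L_2(\Pi)} \sqrt{\frac{\bar t}{n}} + 4 L(a)\, a\, \frac{\bar t}{n}.
\end{aligned} \tag{2.29}$$

If $D \ge 2 C_1$, we have $\varepsilon \ge 2 C_1 L(a)\, \mathbb{E}\|\Xi\|$, and (2.29) implies that

$$\begin{aligned}
& P(\ell \bullet f_{\hat S}) + \frac{1}{2} \tau(a) \|f_{\hat S} - f_S\|^2_{L_2(\Pi)} \\
& \quad \le P(\ell \bullet f_S) + \frac{3}{2 \tau(a)} \beta^2(S)\, \mathrm{rank}(S)\, \varepsilon^2 + \frac{1}{6} \tau(a) \|f_{\hat S} - f_S\|^2_{L_2(\Pi)} \\
& \qquad + \frac{3}{2 \tau(a)} \beta^2(S)\, \mathrm{rank}(S)\, \varepsilon^2 + \frac{1}{6} \tau(a) \|f_{\hat S} - f_S\|^2_{L_2(\Pi)} \\
& \qquad + \frac{24 L^2(a)}{\tau(a)} \frac{\bar t}{n} + \frac{1}{6} \tau(a) \|f_{\hat S} - f_S\|^2_{L_2(\Pi)} + 4 L(a)\, a\, \frac{\bar t}{n}.
\end{aligned} \tag{2.30}$$

Therefore, we have

$$P(\ell \bullet f_{\hat S}) \le P(\ell \bullet f_S) + \frac{3}{\tau(a)} \beta^2(S)\, \mathrm{rank}(S)\, \varepsilon^2 + \Bigl( \frac{24 L^2(a)}{\tau(a)} + 4 L(a)\, a \Bigr) \frac{\bar t}{n}. \tag{2.31}$$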
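The bound of the theorem will follow from (2.23), (2.27) and (2.31) (provided that
conditions (2.19) hold).

We have to choose the numbers $\delta_k^-, \delta_k^+$, $k = 1, 2, 3, 4$ and establish the bound of the
theorem when conditions (2.19) do not hold. First note that, by the definition of $\hat S$,

$$P_n(\ell \bullet f_{\hat S}) + \varepsilon \|\hat S\|_1 \le P_n(\ell \bullet 0) \le Q,$$

implying that $\|\hat S\|_1 \le \frac{Q}{\varepsilon}$. Next note that

$$\|\mathcal{P}_L^{\perp} \hat S\|_1 = \|P_{L^{\perp}} \hat S P_{L^{\perp}}\|_1 \le \|\hat S\|_1 \le \frac{Q}{\varepsilon}$$

and

$$\|\mathcal{P}_L(\hat S - S)\|_1 \le 2 \|\hat S - S\|_1 \le \frac{2Q}{\varepsilon} + 2 \|S\|_1.$$

Obviously, we also have

$$\|\hat S - S\|_1 \le \frac{Q}{\varepsilon} + \|S\|_1.$$

Finally, we have $\|f_{\hat S} - f_S\|_{L_2(\Pi)} \le 2a$ (since $\hat S, S \in \mathbb{D}$ and $\|f_{\hat S}\|_{L_\infty} \le a$, $\|f_S\|_{L_\infty} \le a$).
Due to these facts, we can take

$$\delta_1^+ := 2a, \quad \delta_2^+ := \frac{Q}{\varepsilon}, \quad \delta_3^+ := \frac{2Q}{\varepsilon} + 2 \|S\|_1, \quad \delta_4^+ := \frac{Q}{\varepsilon} + \|S\|_1$$

and, with this choice, $\delta_k^+$, $k = 1, 2, 3, 4$ are upper bounds on the corresponding norms in
(2.19). We will also choose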
d 1 '.= —j=, d 2 := A (62/2), S 3 := A {6 J 2), o 4 := A (oJ/2). 

v/n ne ne ne 



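It is not hard to see that

$$\bar t \vee \tilde t \vee \check t \le t(S; \varepsilon)$$

for a proper choice of numerical constant $B$ in the definition of $t(S; \varepsilon)$. When conditions
(2.19) do not hold (which means that at least one of the numbers $\delta_k^-$, $k = 1, 2, 3, 4$ is not
a lower bound on the corresponding norm), we still can use the bounds

$$(P - P_n)(\ell' \bullet f_{\hat S})(f_{\hat S} - f_S) \le \tilde\alpha_n \bigl( \|f_{\hat S} - f_S\|_{L_2(\Pi)} \vee \delta_1^-; \ \|\mathcal{P}_L^{\perp} \hat S\|_1 \vee \delta_2^-; \ \|\mathcal{P}_L(\hat S - S)\|_1 \vee \delta_3^- \bigr), \tag{2.32}$$

$$(P - P_n)(\ell' \bullet f_{\hat S})(f_{\hat S} - f_S) \le \check\alpha_n \bigl( \|f_{\hat S} - f_S\|_{L_2(\Pi)} \vee \delta_1^-; \ \|\hat S - S\|_1 \vee \delta_4^- \bigr) \tag{2.33}$$

instead of (2.16), (2.17) and, in the case when $\hat S - S \in \mathcal{K}(\mathbb{D}; L; 5)$, we can use the bound

$$(P - P_n)(\ell' \bullet f_{\hat S})(f_{\hat S} - f_S) \le \alpha_n \bigl( \|f_{\hat S} - f_S\|_{L_2(\Pi)} \vee \delta_1^-; \ \|\mathcal{P}_L^{\perp} \hat S\|_1 \vee \delta_2^- \bigr) \tag{2.34}$$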
instead of bound (2.18). It is easy now to modify the proof of (2.21) - (2.31) to show that
in this case we still have

$$P(\ell \bullet f_{\hat S}) \le P(\ell \bullet f_S) + \Bigl( \frac{3}{\tau(a)} \beta^2(S)\, \mathrm{rank}(S)\, \varepsilon^2 \wedge 2 \varepsilon \|S\|_1 \Bigr) + C(a) \frac{t(S; \varepsilon)}{n},$$

which holds with probability at least $1 - e^{-t}$ and implies the bound of the theorem.

□